Large Corpus-based Semantic Feature Extraction for Pronoun Coreference
نویسندگان
چکیده
Semantic information is a very important factor in coreference resolution. The combination of large corpora and ‘deep’ analysis procedures has made it possible to acquire a range of semantic information and apply it to this task. In this paper, we generate two statistically-based semantic features from a large corpus and measure their influence on pronoun coreference. One is contextual compatibility, which decides if the antecedent can be used in the anaphor’s context; the other is role pair, which decides if the actions asserted of the antecedent and the anaphor are likely to apply to the same entity. We apply a semantic labeling system and a baseline coreference system to a large corpus to generate semantic patterns and convert them into features in a MaxEnt model. These features produce an absolute gain of 1.5% to 1.7% in resolution accuracy (a 6% reduction in errors). To understand the limitations of these features, we also extract patterns from the test corpus, use these patterns to train a coreference model, and examine some of the cases where coreference still fails. We also compare the performance of patterns extracted from semantic role labeling and syntax.
منابع مشابه
Corpus based coreference resolution for Farsi text
"Coreference resolution" or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreference when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
متن کاملExploring Local and Global Semantic Information for Event Pronoun Resolution
Event anaphora resolution plays a critical role in discourse analysis. This paper focuses on improving event pronoun resolution using both local and global semantic information. In particular, a predicate-argument structure is proposed to represent the local semantic information about an event while the global semantic information is represented by the entity coreference chains related with var...
متن کاملCorefrence resolution with deep learning in the Persian Labnguage
Coreference resolution is an advanced issue in natural language processing. Nowadays, due to the extension of social networks, TV channels, news agencies, the Internet, etc. in human life, reading all the contents, analyzing them, and finding a relation between them require time and cost. In the present era, text analysis is performed using various natural language processing techniques, one ...
متن کاملQuantitative Evaluation of Coreference Algorithms in anInformation Extraction
Algorithms for performing coreference resolution can only be precisely evaluated given a benchmark corpus of coreference-annotated texts, together with techniques for evaluating the algorithms' output against the corpus. Such a corpus and such techniques have become available for the rst time as part of the Message Understanding Conference 6 (MUC-6) evaluations of information extraction systems...
متن کاملParCor 1.0: A Parallel Pronoun-Coreference Corpus to Support Statistical MT
We present ParCor, a parallel corpus of texts in which pronoun coreference – reduced coreference in which pronouns are used as referring expressions – has been annotated. The corpus is intended to be used both as a resource from which to learn systematic differences in pronoun use between languages and ultimately for developing and testing informed Statistical Machine Translation systems aimed ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2010